Towards Generalising Neural Implicit Representations
Neural implicit representations have shown substantial improvements in
efficiently storing 3D data compared to conventional formats. However,
existing work has focused mainly on storage and subsequent reconstruction. In
this work, we show that training neural representations for reconstruction
tasks alongside conventional tasks can produce more general encodings that
admit reconstructions of equal quality to single-task training, whilst
improving results on conventional tasks compared to single-task encodings. We
reformulate the semantic segmentation task, creating a more representative
task for implicit representation contexts, and, through multi-task experiments
on reconstruction, classification, and segmentation, show that our approach
learns feature-rich encodings that admit equal performance for each task.
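The multi-task setup can be pictured as a shared point encoder whose latent code feeds both an implicit reconstruction head and a conventional task head, trained under a weighted sum of the task losses. Below is a minimal PyTorch sketch of that idea; the network sizes, pooling scheme, loss weights, and the ten-class classification head are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """PointNet-style encoder: per-point MLP followed by max pooling."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))

    def forward(self, pts):                      # pts: (B, N, 3) surface samples
        return self.mlp(pts).max(dim=1).values   # (B, latent_dim) shared encoding

class OccupancyHead(nn.Module):
    """Implicit reconstruction head: (encoding, query point) -> occupancy logit."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(latent_dim + 3, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, code, query):              # query: (B, Q, 3) query points
        code = code.unsqueeze(1).expand(-1, query.shape[1], -1)
        return self.mlp(torch.cat([code, query], dim=-1)).squeeze(-1)

encoder, occ_head = PointEncoder(), OccupancyHead()
cls_head = nn.Linear(256, 10)                    # conventional task head (10 classes assumed)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

def multitask_loss(pts, query, occ_gt, labels, w_rec=1.0, w_cls=0.5):
    code = encoder(pts)                              # one encoding serves both tasks
    loss_rec = bce(occ_head(code, query), occ_gt)    # reconstruction task
    loss_cls = ce(cls_head(code), labels)            # conventional (classification) task
    return w_rec * loss_rec + w_cls * loss_cls
```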
DSConv: Efficient Convolution Operator
Quantization is a popular way of increasing the speed and lowering the memory
usage of Convolutional Neural Networks (CNNs). When labelled training data is
available, network weights and activations have successfully been quantized
down to 1-bit. The same cannot be said about the scenario when labelled
training data is not available, e.g. when quantizing a pre-trained model, where
current approaches show, at best, no loss of accuracy at 8-bit quantizations.
We introduce DSConv, a flexible quantized convolution operator that replaces
single-precision operations with their far less expensive integer counterparts,
while maintaining the probability distributions over both the kernel weights
and the outputs. We test our model as a plug-and-play replacement for standard
convolution on the most popular neural network architectures (ResNet, DenseNet,
GoogLeNet, AlexNet, and VGG-Net) and demonstrate state-of-the-art results, with
less than 1% loss of accuracy, without retraining, using only 4-bit
quantization. We also show how a distillation-based adaptation stage with
unlabelled data can improve results even further.
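The core mechanism, splitting each kernel into fixed-size blocks of low-bit integers with a per-block floating-point scale so that the bulk of the arithmetic can run on cheap integer units, can be sketched as follows. This is a minimal illustration assuming a signed 4-bit format and a block size of 32; it is not the released DSConv operator.

```python
import math
import torch
import torch.nn.functional as F

def blockwise_quantize(weight, bits=4, block=32):
    """weight: (out_c, in_c, kh, kw) FP kernel -> int8 blocks + per-block FP scales."""
    qmax = 2 ** (bits - 1) - 1                        # 7 for signed 4-bit values
    w = weight.reshape(weight.shape[0], -1)           # flatten per output channel
    w = F.pad(w, (0, (-w.shape[1]) % block))          # pad to a multiple of the block size
    w = w.reshape(w.shape[0], -1, block)              # (out_c, n_blocks, block)
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.round(w / scale).clamp(-qmax - 1, qmax).to(torch.int8)
    return q, scale                                   # integer kernel + FP per-block scales

def dequantize(q, scale, orig_shape):
    """Reconstruct an approximate FP kernel from the integer blocks and scales."""
    w = (q.float() * scale).reshape(q.shape[0], -1)
    return w[:, :math.prod(orig_shape[1:])].reshape(orig_shape)

# Example (hypothetical layer): quantize once, keep int8 blocks plus small FP scales.
# conv = torch.nn.Conv2d(64, 128, 3)
# q, s = blockwise_quantize(conv.weight.data)
# w_approx = dequantize(q, s, conv.weight.shape)
```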
DFNet: Enhance Absolute Pose Regression with Direct Feature Matching
We introduce a camera relocalization pipeline that combines absolute pose
regression (APR) and direct feature matching. Existing photometric-based
methods have trouble on scenes with large photometric distortions, e.g. outdoor
environments. By incorporating exposure-adaptive novel view synthesis, our
method can successfully address these challenges. Moreover, by introducing
domain-invariant feature matching, our solution can improve pose regression
accuracy while using semi-supervised learning on unlabeled data. In particular,
the pipeline consists of two components, Novel View Synthesizer and FeatureNet
(DFNet). The former synthesizes novel views compensating for changes in
exposure and the latter regresses camera poses and extracts robust features
that bridge the domain gap between real images and synthetic ones. We show that
domain invariant feature matching effectively enhances camera pose estimation
both in indoor and outdoor scenes. Hence, our method achieves state-of-the-art
accuracy, outperforming existing single-image APR methods by as much as 56% and
performing comparably to 3D structure-based methods.
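Conceptually, the direct feature matching term compares dense features of the real image with those of a view synthesized at the currently predicted pose, so no pose label is needed for that signal. A minimal sketch of that idea follows, assuming hypothetical pose_net, synthesizer, and feature_net modules (illustrative stand-ins, not the released DFNet components).

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(feat_real, feat_synth):
    """feat_*: (B, C, H, W) dense feature maps; returns mean (1 - cosine similarity)."""
    fr = F.normalize(feat_real.flatten(2), dim=1)     # unit-norm channel vector per location
    fs = F.normalize(feat_synth.flatten(2), dim=1)
    return (1.0 - (fr * fs).sum(dim=1)).mean()

def relocalization_step(img_real, pose_net, synthesizer, feature_net):
    """pose_net, synthesizer, feature_net are hypothetical stand-ins for the pose
    regressor, the exposure-adaptive view synthesizer, and the feature extractor."""
    pose = pose_net(img_real)                         # absolute pose regression (APR)
    img_synth = synthesizer(pose)                     # novel view rendered at the predicted pose
    loss = feature_matching_loss(feature_net(img_real), feature_net(img_synth))
    return pose, loss                                 # this loss needs no ground-truth pose
```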
Dual-Resolution Correspondence Networks
We tackle the problem of establishing dense pixel-wise correspondences
between a pair of images. In this work, we introduce Dual-Resolution
Correspondence Networks (DRC-Net), to obtain pixel-wise correspondences in a
coarse-to-fine manner. DRC-Net extracts both coarse- and fine-resolution
feature maps. The coarse maps are used to produce a full but coarse 4D
correlation tensor, which is then refined by a learnable neighbourhood
consensus module. The fine-resolution feature maps are used to obtain the final
dense correspondences guided by the refined coarse 4D correlation tensor. The
selected coarse-resolution matching scores allow the fine-resolution features
to focus only on a limited number of possible matches with high confidence. In
this way, DRC-Net dramatically increases matching reliability and localisation
accuracy, while avoiding the application of expensive 4D convolution kernels to
fine-resolution feature maps. We comprehensively evaluate our method on
large-scale public benchmarks including HPatches, InLoc, and Aachen Day-Night.
It achieves state-of-the-art results on all of them.
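The full-but-coarse 4D correlation tensor at the heart of this scheme is simply the set of pairwise similarities between every coarse location in one image and every coarse location in the other. A minimal sketch, assuming L2-normalised feature maps (illustrative, not the released DRC-Net code):

```python
import torch
import torch.nn.functional as F

def correlation_4d(feat_a, feat_b):
    """feat_a, feat_b: (B, C, H, W) coarse feature maps of the two images.
    Returns a (B, H, W, H, W) tensor of pairwise feature similarities."""
    B, C, H, W = feat_a.shape
    fa = F.normalize(feat_a, dim=1).reshape(B, C, H * W)
    fb = F.normalize(feat_b, dim=1).reshape(B, C, H * W)
    corr = torch.einsum('bci,bcj->bij', fa, fb)       # every location vs. every location
    return corr.reshape(B, H, W, H, W)                # full but coarse 4D correlation tensor
```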
BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion
Dense 3D reconstruction from a stream of depth images is the key to many
mixed reality and robotic applications. Although methods based on Truncated
Signed Distance Function (TSDF) Fusion have advanced the field over the years,
the TSDF volume representation must strike a balance between robustness to
noisy measurements and preserving the level of detail. We
present Bi-level Neural Volume Fusion (BNV-Fusion), which leverages recent
advances in neural implicit representations and neural rendering for dense 3D
reconstruction. In order to incrementally integrate new depth maps into a
global neural implicit representation, we propose a novel bi-level fusion
strategy that considers both efficiency and reconstruction quality by design.
We evaluate the proposed method on multiple datasets quantitatively and
qualitatively, demonstrating a significant improvement over existing methods.
Comment: Accepted at CVPR 202
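While the abstract does not spell out the fusion rule, the idea of incrementally integrating each new depth map into a persistent neural volume can be illustrated with a running weighted average over per-voxel feature codes, analogous to classic TSDF fusion but applied to learned features. This is a speculative sketch under that assumption, not the authors' formulation.

```python
import torch

class GlobalFeatureVolume:
    """Hypothetical global volume of per-voxel feature codes with fusion weights."""
    def __init__(self, resolution=128, feat_dim=8):
        self.feat = torch.zeros(resolution, resolution, resolution, feat_dim)
        self.weight = torch.zeros(resolution, resolution, resolution)

    def integrate(self, voxel_idx, frame_feat, frame_weight):
        """voxel_idx: (N, 3) long indices touched by the new depth map;
        frame_feat: (N, feat_dim) per-voxel codes from the current frame;
        frame_weight: (N,) per-voxel confidences (e.g. from depth noise)."""
        x, y, z = voxel_idx.unbind(dim=1)
        w_old = self.weight[x, y, z].unsqueeze(-1)
        w_new = frame_weight.unsqueeze(-1)
        self.feat[x, y, z] = (self.feat[x, y, z] * w_old + frame_feat * w_new) \
                             / (w_old + w_new).clamp(min=1e-8)
        self.weight[x, y, z] += frame_weight
```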
InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure
Volumetric models have become a popular representation for 3D scenes in
recent years. One breakthrough leading to their popularity was KinectFusion,
which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM
has since also been tackled with very similar approaches. Representing the
reconstruction volumetrically as a TSDF leads to most of the simplicity and
efficiency that can be achieved with GPU implementations of these systems.
However, this representation is memory-intensive and limits applicability to
small-scale reconstructions. Several avenues have been explored to overcome
this. With the aim of summarizing them and providing a fast, flexible 3D
reconstruction pipeline, we propose a new, unifying framework called InfiniTAM.
The idea is that steps like camera tracking, scene representation and
integration of new data can easily be replaced and adapted to the user's needs.
This report describes the technical implementation details of InfiniTAM v3,
the third version of our InfiniTAM system. We have added various new features
and made numerous enhancements to the low-level code that
significantly improve our camera tracking performance. The new features that we
expect to be of most interest are (i) a robust camera tracking module; (ii) an
implementation of Glocker et al.'s keyframe-based random ferns camera
relocaliser; (iii) a novel approach to globally-consistent TSDF-based
reconstruction, based on dividing the scene into rigid submaps and optimising
the relative poses between them; and (iv) an implementation of Keller et al.'s
surfel-based reconstruction approach.
Comment: This article largely supersedes arXiv:1410.0925 (it describes version
3 of the InfiniTAM framework).
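The framework's central design point, that tracking, scene representation, and data integration are interchangeable components behind stable interfaces, can be illustrated abstractly as below. InfiniTAM itself is a C++ framework; these Python interfaces are purely illustrative and do not mirror its actual class names.

```python
from abc import ABC, abstractmethod

class Tracker(ABC):
    @abstractmethod
    def track(self, scene, rgbd_frame):
        """Estimate the camera pose of the new frame against the current scene."""

class Integrator(ABC):
    @abstractmethod
    def integrate(self, scene, rgbd_frame, pose):
        """Fuse the new frame into the scene representation at the given pose."""

class ReconstructionPipeline:
    """Per-frame loop: any Tracker/Integrator pair can be plugged in."""
    def __init__(self, scene, tracker, integrator):
        self.scene, self.tracker, self.integrator = scene, tracker, integrator

    def process_frame(self, rgbd_frame):
        pose = self.tracker.track(self.scene, rgbd_frame)
        self.integrator.integrate(self.scene, rgbd_frame, pose)
        return pose
```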
NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
Training a Neural Radiance Field (NeRF) without pre-computed camera poses is
challenging. Recent advances in this direction demonstrate the possibility of
jointly optimising a NeRF and camera poses in forward-facing scenes. However,
these methods still face difficulties during dramatic camera movement. We
tackle this challenging problem by incorporating undistorted monocular depth
priors. These priors are generated by correcting scale and shift parameters
during training, with which we are then able to constrain the relative poses
between consecutive frames. This constraint is achieved using our proposed
novel loss functions. Experiments on real-world indoor and outdoor scenes show
that our method can handle challenging camera trajectories and outperforms
existing methods in terms of novel view rendering quality and pose estimation
accuracy. Our project page is https://nope-nerf.active.vision
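The depth prior is made usable by learning a per-frame scale and shift, and the undistorted depths then constrain consecutive frames through their back-projected point clouds. A minimal sketch of that idea follows, using a Chamfer-style point-cloud alignment as the inter-frame term; the exact loss formulation in the paper may differ.

```python
import torch

def undistort_depth(mono_depth, scale, shift):
    """mono_depth: (H, W) monocular depth prior; scale, shift: learnable per-frame scalars."""
    return scale * mono_depth + shift

def backproject(depth, K_inv):
    """depth: (H, W); K_inv: (3, 3) inverse intrinsics. Returns (H*W, 3) camera-space points."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3).float()
    return (K_inv @ pix.T).T * depth.reshape(-1, 1)

def chamfer_distance(p, q):
    """Symmetric nearest-neighbour distance between point clouds p: (N, 3), q: (M, 3).
    In practice the clouds would be subsampled to keep this tractable."""
    d = torch.cdist(p, q)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def relative_pose_loss(pts_i, pts_j, R_ij, t_ij):
    """Align frame i's back-projected points, transformed by the current relative
    pose estimate (R_ij, t_ij), against frame j's back-projected points."""
    return chamfer_distance((R_ij @ pts_i.T).T + t_ij, pts_j)
```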
FlowNet3D++: Geometric Losses For Deep Scene Flow Estimation
We present FlowNet3D++, a deep scene flow estimation network. Inspired by
classical methods, FlowNet3D++ incorporates geometric constraints in the form
of point-to-plane distance and angular alignment between individual vectors in
the flow field, into FlowNet3D. We demonstrate that the addition of these
geometric loss terms improves the previous state-of-the-art FlowNet3D accuracy from
57.85% to 63.43%. To further demonstrate the effectiveness of our geometric
constraints, we propose a benchmark for flow estimation on the task of dynamic
3D reconstruction, thus providing a more holistic and practical measure of
performance than the breakdown of individual metrics previously used to
evaluate scene flow. This is made possible through the contribution of a novel
pipeline to integrate point-based scene flow predictions into a global dense
volume. FlowNet3D++ achieves up to a 15.0% reduction in reconstruction error
over FlowNet3D, and up to a 35.2% improvement over KillingFusion alone. We will
release our scene flow estimation code later.
Comment: Accepted in WACV 202
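The two geometric terms named above, point-to-plane distance and angular alignment, can be written compactly. A minimal sketch assuming per-point target correspondences and normals are available; the weights and variable names are illustrative, not the released code.

```python
import torch
import torch.nn.functional as F

def point_to_plane_loss(p_src, flow_pred, p_tgt, n_tgt):
    """p_src: (N, 3) source points; flow_pred: (N, 3) predicted flow;
    p_tgt, n_tgt: (N, 3) corresponding target points and their unit normals."""
    warped = p_src + flow_pred
    return (((warped - p_tgt) * n_tgt).sum(dim=-1) ** 2).mean()

def angular_alignment_loss(flow_pred, flow_gt):
    """Penalise the angle between predicted and ground-truth flow vectors."""
    cos = F.cosine_similarity(flow_pred, flow_gt, dim=-1)
    return (1.0 - cos).mean()

def geometric_loss(p_src, flow_pred, p_tgt, n_tgt, flow_gt, w_plane=1.0, w_ang=1.0):
    return (w_plane * point_to_plane_loss(p_src, flow_pred, p_tgt, n_tgt)
            + w_ang * angular_alignment_loss(flow_pred, flow_gt))
```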